Linear ARCH (LARCH) processes were introduced by Robinson [J. Econometrics 47 (1991) 
67-84] to model long-range dependence in volatility and leverage. Basic theoretical properties 
of LARCH processes have been investigated in the recent literature. However, there is a lack of 
estimation methods and corresponding asymptotic theory. In this paper, we consider estimation 
of the dependence parameters for LARCH processes with non-summable hyperbolically decaying 
coefficients. Asymptotic limit theorems are derived. A central limit theorem with $\sqrt{n}$-rate of 
convergence holds for an approximate conditional pseudo-maximum likelihood estimator. To 
obtain a computable version that includes observed values only, a further approximation is 
required. The computable estimator is again asymptotically normal, however with a rate of 
convergence that is slower than $\sqrt{n}$. 

Keywords: asymptotic distribution; LARCH process; long-range dependence; parametric 
estimation; volatility 

1. Introduction 

Since the introduction of ARCH and GARCH processes in the seminal papers of Engle 
(1982) and Bollerslev (1986), an abundance of models with conditional heteroskedasticity 
has been proposed. More recently, modifications of these models have been introduced 
to include the possibility of slowly decaying correlations (long memory) in volatility. This 
was motivated by the observation that empirical autocorrelations in squared log-returns 
often persist over long stretches of time. Long memory means that the sum of autocorrelations 
over all lags is infinite. As it turns out, not all models proposed in this context 
have long memory in volatility, although their correlations may decay hyperbolically. For 
instance, no second order stationary ARCH($\infty$) process $X_t$ with non-summable autocorrelations 
of $X_t^2$ exists (Giraitis et al. (2000a, 2000b)). Models with genuine long memory 
in volatility include linear ARCH (LARCH) models introduced by Robinson (1991) 
and stochastic volatility (SV) models such as the FIEGARCH process (Harvey (1998), 
Robinson (2001), Surgailis and Viano (2002)). With respect to estimation, SV models 
are somewhat complicated since they are based on unobservable latent processes. In contrast, 
no latent process is included in the definition of LARCH processes. This allows 
for direct estimation of unknown parameters, including maximum likelihood estimation 
and related methods. For LARCH processes, the difficulty in studying asymptotics of 
parameter estimates is, however, the rather complex structure of the stationary solution 
(Giraitis et al. (2000a, 2000b)). The problem of location estimation is considered in Beran 
(2006). Related limit theorems can be found in Berkes and Horvath (2003) and Giraitis 
et al. (2000a, 2000b). Here, we will consider estimation of dependence parameters for 
LARCH processes with hyperbolically decaying non-summable weights. 
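A LARCH process $(X_t,\sigma_t)_{t\in\mathbb{Z}}$ is defined by 

$$X_t=\varepsilon_t\sigma_t, \qquad (1)$$

$$\sigma_t=a+\sum_{j=1}^{\infty}b_jX_{t-j}, \qquad (2)$$

where the following assumptions hold: 

(A1) $\varepsilon_t$ are i.i.d. random variables defined on a probability space $(\Omega,\mathcal{F},P)$, with 
continuous distribution, $E(\varepsilon_t)=0$ and $E(\varepsilon_t^2)=1$; 
(A2) $a\neq0$ and $\|b\|_2^2=\sum_{j=1}^{\infty}b_j^2<1$. 

The stationary solution of the LARCH equations is given by 

$$\sigma_t=a+a\sum_{k=1}^{\infty}\ \sum_{j_1,\ldots,j_k=1}^{\infty}b_{j_1}\cdots b_{j_k}\,\varepsilon_{t-j_1}\cdots\varepsilon_{t-j_1-\cdots-j_k}$$

(Giraitis et al. (2000a, 2000b)). Obviously, the process $(X_t)_{t\in\mathbb{Z}}$ is uncorrelated. Giraitis 
et al. (2003) showed that if $b_j\sim cj^{d-1}$ as $j\to\infty$ for some $d\in(0,\tfrac12)$ and $E(X_t^4)<\infty$, then 
there is long memory in volatility characterized by 

$$\gamma_\sigma(k)=\mathrm{cov}(\sigma_0,\sigma_k)\sim c_1|k|^{2d-1}\qquad(|k|\to\infty)$$

and 

$$\gamma_{X^2}(k)=\mathrm{cov}(X_0^2,X_k^2)\sim c_2|k|^{2d-1}\qquad(|k|\to\infty),$$

and the same is true for the leverage covariance $\gamma_L(k)=\mathrm{cov}(\sigma_k^2,X_0)$. 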

The main purpose of our work is to provide statistical theory for the estimation of 
a parametric version of (1) and (2). Thus, we assume $a$ and $(b_j)_{j\ge1}$ to depend on a 
finite-dimensional parameter vector $\theta$. We will focus on conditional maximum likelihood 
estimation, a method often used for models with conditional heteroskedasticity. Under 
the assumption of Gaussian $\varepsilon_t$, the following approximate maximum likelihood estimator 
of $\theta$ can be defined: 

$$\theta^*:=\arg\min_{\theta\in\Theta}L_n^*(\theta),$$

where 

$$L_n^*(\theta)=\frac1n\sum_{t=1}^{n}\left[\frac{X_t^2}{\sigma_t^2(\theta)}+\ln\sigma_t^2(\theta)\right]$$

and 

$$\sigma_t(\theta)=a(\theta)+\sum_{j=1}^{\infty}b_j(\theta)X_{t-j}.$$

Given a finite sample, $\sigma_t(\theta)$ has to be replaced by a proxy $\bar\sigma_t(\theta)$, depending on the 
finite past only. Since, in general, $\varepsilon_t$ is not assumed to be normal, $\theta^*$ is called a 
pseudo-maximum likelihood estimator (PMLE). In the case where $(X_t,\sigma_t)$ is the original 
ARCH(1) or GARCH(1,1) process, the asymptotic properties of $\theta^*$ have been investigated 
in Lee and Hansen (1994) and Lumsdaine (1996), and were generalized to 
GARCH($p,q$) and ARCH($\infty$) processes by Berkes et al. (2003) and Robinson and Zaffaroni 
(2006), respectively. For long-memory LARCH processes, derivation of asymptotic 
results is more complicated because the coefficients $b_j$ are not summable. Moreover, $\sigma_t^2$ 
may become arbitrarily small and hence $\sigma_t^{-2}$ and its derivatives arbitrarily large. The first 
problem leads to difficulties with respect to differentiability of $\sigma_t(\theta)$ as a function of $\theta$. 
Additional assumptions on the parametric model are therefore needed (see Section 2). The 
second problem can be avoided by modifying the original maximum likelihood equations 
(see Section 3). Also, note that parametric estimation for finite order LARCH processes, 
that is, where the sum in (2) is finite and thus the autocorrelations of the squares are 
absolutely summable, is considered in Francq and Zakoian (2008) and Truquet (2008). 

The outline of the paper is as follows. Section 2 deals with ergodicity and differentiability 
as necessary prerequisites. Estimation of $\theta$ is considered in Section 3. Asymptotic 
results are derived for two versions of a modified MLE: (a) estimate with $\sigma_t(\theta)$ 
($t=1,\ldots,n$) and (b) estimate including only values of $\sigma_t(\theta)$ that can be approximated 
with sufficient accuracy. Lemmas needed in the proofs of the main results can be found 
in the Appendix. A small simulation study in Section 4 illustrates the theoretical results. 
Some general comments in Section 5 conclude the paper. 

2. Ergodicity and differentiability 
2.1. Ergodicity 

To ensure consistency, ergodicity of $\sigma_t$ is needed. The following proposition is an extension 
of Theorem 2.1 in Giraitis et al. (2003). 

Proposition 1. Under (A1) and (A2), there exists a unique strictly and second order 
stationary solution of (1) and (2). This solution is ergodic. 

Proof. $\sigma_t$ is given by the Volterra decomposition (see Giraitis et al. (2000a, 2000b)) 

$$\sigma_t=a+a\sum_{k=1}^{\infty}\ \sum_{j_1,\ldots,j_k=1}^{\infty}b_{j_1}\cdots b_{j_k}\,\varepsilon_{t-j_1}\cdots\varepsilon_{t-j_1-\cdots-j_k}.$$

Since $\{\varepsilon_{i_1}\cdots\varepsilon_{i_r}\}_{i_1<\cdots<i_r,\ r\ge1}$ is an orthonormal system, convergence in the $L^2(\Omega)$-norm 
follows from (A2) since 

$$\sum_{k=1}^{\infty}\ \sum_{j_1,\ldots,j_k=1}^{\infty}b_{j_1}^2\cdots b_{j_k}^2=\sum_{k=1}^{\infty}\|b\|_2^{2k}<\infty.$$

For the uniqueness of $\sigma_t$, we refer to Giraitis et al. (2003). For the proof of ergodicity, it 
is sufficient to find a measurable function $f:\mathbb{R}^{\infty}\to\mathbb{R}$ with $\sigma_t=f(\varepsilon_{t-1},\varepsilon_{t-2},\ldots)$, where 
equality holds almost surely (see, for example, Theorem 3.5.8 in Stout (1974)). First, note 
that convergence of the infinite sum defining the solution is independent of the order of 
summation since the series of squared coefficients is absolutely summable. Hence, we 
make use of the following alternative representation of $\sigma_t$. Define 

$$f_k(x_1,x_2,\ldots)=\sum_{\substack{j_1,\ldots,j_l\ge1,\ l\le k\\ j_1+\cdots+j_l=k}}b_{j_1}\cdots b_{j_l}\,x_{j_1}x_{j_1+j_2}\cdots x_{j_1+\cdots+j_l}$$

and 

$$M_t(k)=f_k(\varepsilon_{t-1},\varepsilon_{t-2},\ldots).$$

Then 

$$\sigma_t=a+a\sum_{k=1}^{\infty}M_t(k)$$

and for every fixed $t\in\mathbb{Z}$, $M_t(k)$, $k=1,2,\ldots$, is a martingale difference w.r.t. $\mathcal{F}_k=\sigma\{M_t(l),\ l\le k\}$. 
An application of the martingale convergence theorem yields that 

$$S_t(m)=\sum_{k=1}^{m}M_t(k)\to\sum_{k=1}^{\infty}M_t(k)$$

as $m\to\infty$ almost surely. Hence, the desired representation is given by 

$$f=a+a\sum_{k=1}^{\infty}f_k.$$

For the measurability of $f$, see Corollary 2.1.3 in Straumann (2004). □ 

2.2. Differentiability 

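For simplicity of notation, we will concentrate on coefficients $(b_j)_{j\ge1}$ of the following 
type: 

(B1) $b_j(c,d)=cj^{d-1}$, where $d\in[0,d_u]$, $d_u<\tfrac12$, $c\in[0,c_u(d)]$ and 

$$c_u(d)=\Bigl(C\Big/\sum_{j=1}^{\infty}j^{2d-2}\Bigr)^{1/2}$$

with $0<C<1$. 
(B2) $a\in[a_l,a_u]$, with $0<a_l<a_u<\infty$. 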

Assumption (B1) ensures the summability constraint in (A2). Extending the results 
to more general weights, such as, for instance, those obtained from the FARIMA($p,d,q$) 
operator (see Granger and Joyeux (1980), Hosking (1981)), is straightforward. For instance, 
we may consider FARIMA($0,d,0$) weights $b_j$ defined by 

$$\sum_{j=1}^{\infty}b_jB^j=c(d)\,[(1-B)^{-d}-1],$$

where $0<d<\tfrac12$ and $c(d)$ is a constant such that $\sum_{j=1}^{\infty}b_j^2<1$. Note, in particular, that here 
$(1-B)^{-d}$ instead of $(1-B)^{d}$ induces long memory for $d>0$. 
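As a rough illustration of the FARIMA(0, d, 0) weights, the following sketch computes the coefficients of $(1-B)^{-d}$ by the standard recursion $\psi_j=\psi_{j-1}(j-1+d)/j$ and rescales them so that the truncated sum of squares equals a chosen value below one. The function names, the truncation point and the target value are assumptions made for this example only; truncation slightly understates the infinite sum.

```python
def farima_weights(d, n_terms):
    """Coefficients psi_1..psi_n of (1 - B)^(-d) = 1 + sum_j psi_j B^j."""
    weights = []
    psi = 1.0  # psi_0
    for j in range(1, n_terms + 1):
        psi *= (j - 1 + d) / j  # psi_j = psi_{j-1} * (j - 1 + d) / j
        weights.append(psi)
    return weights


def scaled_weights(d, target_sq_norm, n_terms):
    """b_j = c * psi_j, with c chosen so that the truncated sum of b_j^2
    equals target_sq_norm (hypothetical choice, should be < 1)."""
    psi = farima_weights(d, n_terms)
    c = (target_sq_norm / sum(p * p for p in psi)) ** 0.5
    return [c * p for p in psi]
```

The weights decay like $j^{d-1}$, so doubling the lag multiplies a weight by roughly $2^{d-1}$.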

In the following, we will use the notation $\Theta\subset[0,\tfrac12)\times(\mathbb{R}_+)^2$ for the set of all 
$\theta=(d,c,a)^T$ such that (B1) and (B2) hold. Moreover, for a real matrix $A$, we define the 
matrix norm 

$$\|A\|=\mathrm{tr}(A^TA)^{1/2}.$$

Convergence of matrices will be understood with respect to this norm. The LARCH 
process $(X_t,\sigma_t)_{t\in\mathbb{Z}}$ will be assumed to belong to the parametric family with $\theta_0$ in the 
interior of $\Theta$. 

From the given dynamical structure in equation (2), we can reconstruct the unobservable 
conditional variance $\sigma_t^2$ from the infinite past $(X_s)_{s<t}$ as follows. Define, for any 
$\theta\in\Theta$ and $t\in\mathbb{Z}$, 

$$\sigma_t(\theta)=a+\sum_{j=1}^{\infty}b_j(c,d)X_{t-j}.$$

For the process with true parameter $\theta_0$, we have, in particular, 

$$\sigma_t^2(\theta_0)=\mathrm{var}(X_t\mid X_s,\ s\le t-1).$$

Given a finite sample $(X_t)_{t=1,\ldots,n}$, $\sigma_t(\theta)$ has to be approximated, for instance, by 

$$\bar\sigma_t(\theta)=a+\sum_{j=1}^{t-1}b_j(c,d)X_{t-j},\qquad t\ge1.$$

The extent to which this may be a good approximation of $\sigma_t(\theta)$ will be discussed in 
Section 3. 
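The finite-past approximation can be sketched in a few lines. The code below assumes the hyperbolic weights $b_j(c,d)=cj^{d-1}$ and a sample stored as a Python list with `x[0]` playing the role of $X_1$; the function names are hypothetical and the loop is written for readability, not speed.

```python
def hyperbolic_weights(c, d, n_terms):
    """b_j(c, d) = c * j^(d - 1) for j = 1..n_terms."""
    return [c * j ** (d - 1.0) for j in range(1, n_terms + 1)]


def sigma_bar(x, a, c, d):
    """Truncated volatility proxies for t = 1..n, where x[0] is X_1.

    Each proxy is a + sum_{j=1}^{t-1} b_j X_{t-j}; the first one equals a
    because no past observation is available.
    """
    n = len(x)
    b = hyperbolic_weights(c, d, n - 1) if n > 1 else []
    out = []
    for t in range(1, n + 1):
        # X_{t-j} is stored at index t - j - 1
        s = sum(b[j - 1] * x[t - j - 1] for j in range(1, t))
        out.append(a + s)
    return out
```

For example, with `a = 1`, `c = 0.5`, `d = 0` and the sample `[1.0, 2.0, 3.0]`, the weights are `0.5` and `0.25`, so the proxies are `1.0`, `1.5` and `2.25`.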

We now consider the properties of $\sigma_t(\theta)$ for fixed $t\in\mathbb{Z}$ as a stochastic process with index 
$\theta\in\Theta$. The reason is that almost sure continuity and differentiability of $\sigma_t(\theta)$ as a function 
of $\theta$ will be required in the next section. Moreover, we need to ensure measurability of 
infima involving $\sigma_t(\theta)$ on the uncountable set $\Theta$. In the case of absolutely summable 
coefficients $(b_j)_{j\ge1}$, this is not a problem since the infinite sum defining the stationary 
solution is uniformly absolutely summable, on a set of probability one, and $\sigma_t(\theta)$ inherits 
the properties of $b_j(c,d)$. In contrast, for non-summable $b_j$, this is not automatically the 
case. We therefore impose the following assumption. 

(S) For every $t\in\mathbb{Z}$, $(\sigma_t(\theta))_{\theta\in\Theta}$ is a separable stochastic process on $\Theta$, that is, for every 
open $A\subset\Theta$ and closed interval $B$, the sets 

$$\{\omega\mid\sigma_t(\theta)\in B,\ \forall\theta\in A\}\quad\text{and}\quad\{\omega\mid\sigma_t(\theta)\in B,\ \forall\theta\in A\cap\mathbb{Q}^3\}$$

differ only on a set $C\subset N_0$, where $P(N_0)=0$. 

Remark 1. The process $(\sigma_t(\theta))_{\theta\in\Theta}$ can always be replaced by a separable version (see 
Theorem 2.4 in Doob (1953)). 

The following result can now be obtained. 

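Proposition 2. Under assumptions (A1), (B1), (B2) and (S), $\sigma_t(\theta)$ is almost surely 
infinitely often differentiable in $\theta$ and the $k$th partial derivative w.r.t. $d$ is given by 

$$\frac{\partial^k}{\partial d^k}\sigma_t(\theta)=\sum_{j=1}^{\infty}\frac{\partial^k}{\partial d^k}b_j(c,d)\,X_{t-j}.$$

Proof. Let 

$$\sigma_t(d):=\sigma_t((d,1,a)^T).$$

The covariance function of $[\sigma_t(d)]_{0\le d\le d_u}$ is given by 

$$v(d,d')=\mathrm{Cov}(\sigma_t(d),\sigma_t(d'))=E(X_t^2)\sum_{j=1}^{\infty}b_j(1,d)\,b_j(1,d'),$$

which is infinitely often differentiable for all $0<d,d'<\tfrac12$. Since $a$ and $c$ are just additive 
and multiplicative components, respectively, in $\sigma_t(\theta)$, existence of derivatives follows 
immediately from Lemma 1 (see the Appendix). Indeed, iteration of the following calculation 
shows that the partial derivatives w.r.t. $d$ can be calculated as claimed: Taylor 
series expansion of $b_j(1,d)$ for each $j$ up to order 2 yields 

$$E\left[\left(\frac1h\bigl(\sigma_t(d+h)-\sigma_t(d)\bigr)-\sum_{j=1}^{\infty}\frac{\partial}{\partial d}b_j(1,d)X_{t-j}\right)^2\right]=\frac{h^2}{4}\,E\left[\left(\sum_{j=1}^{\infty}\frac{\partial^2}{\partial d^2}b_j(1,d_j^*)X_{t-j}\right)^2\right]\to0$$

as $h\to0$, where $d\le d_j^*\le d+h$. □ 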



Lemma 1 in the Appendix also implies that, under (S), we are able to find bounds on 
$E(\sup_{\theta\in\Theta}|\sigma_t(\theta)|^m)$ ($m\ge1$) in terms of $\sup_{\theta\in\Theta}E(|\sigma_t(\theta)|^m)$ and $\sup_{\theta\in\Theta}E(|\partial_d\sigma_t(\theta)|^m)$. 
This is very useful for proving uniform convergence results. 



3. Estimation 

3.1. Estimation with exact conditional variances 

Define 

$$|\mu|_p=E(|\varepsilon_t|^p)$$

and 

$$\|b\|_p^p=\sum_{j=1}^{\infty}|b_j|^p.$$

The following assumptions ensure the existence of unconditional moments of $\sigma_t$ and $X_t$. 
Assumptions $(M_3)$ and $(M_p'')$ are from Giraitis et al. (2003), while $(M_p')$ is from 
Giraitis et al. (2000b). 

$(M_3)$ $|\mu|_3<\infty$ and $|\mu|_3^{1/3}\|b(\theta_0)\|_3+3\zeta\|b(\theta_0)\|_2<1$, where $\zeta$ is the positive solution of 
the equation $3\zeta^2-3\zeta-1=0$. 
$(M_p')$ For $p\ge2$, $|\mu|_p<\infty$ and $(2^p-p-1)^{1/2}|\mu|_p^{1/p}\|b(\theta_0)\|_2<1$. 
$(M_p'')$ For even $p\ge4$, $|\mu|_p<\infty$ and $\sum_{j=2}^{p}\binom{p}{j}\|b(\theta_0)\|_j^j|\mu_j|<1$. 

Remark 2. For even $p\ge4$, $(M_p'')$ is weaker than $(M_p')$. For Gaussian (and similar) $\varepsilon_t$, 
$(M_3')$ is weaker than $(M_3)$. We will therefore make use of assumption $(M_p')$ only if either 
$p=5$ or $(M_3')$ is weaker than $(M_3)$. The assumptions we will use are only sufficient; more 
general (but complicated) conditions can be formulated in terms of the moments of $\sigma_t(\theta)$ 
and its derivatives. 

First, we will assume that $\sigma_t(\theta)$ can be calculated exactly, that is, as if we knew the 
infinite past $(X_s)_{s\le n}$. To avoid the problem of unbounded $\sigma_t^{-2}$ (see Section 1), we modify 
the maximum likelihood estimator as follows. Let 

$$L_n(\theta)=\frac1n\sum_{t=1}^{n}l_t(\theta)=\frac1n\sum_{t=1}^{n}\left[\frac{X_t^2+\epsilon}{\sigma_t^2(\theta)+\epsilon}+\ln\bigl(\sigma_t^2(\theta)+\epsilon\bigr)\right],$$

where $\epsilon>0$ is a small constant, and define the estimator 

$$\hat\theta_n^{(1)}:=\arg\min_{\theta\in\Theta}L_n(\theta).$$

Furthermore, denote by $L(\theta)=E[l_t(\theta)]$ the expected value of the individual terms in $L_n$. 
Consistency is given by the following result. 

Theorem 1. Let $\epsilon>0$ and assume that (A1), (B1), (B2) and (S) hold. Then, under 
$(M_3)$ or $(M_3')$, $\hat\theta_n^{(1)}$ is a strongly consistent estimator of $\theta_0$, that is, as $n\to\infty$, 

$$\hat\theta_n^{(1)}\to\theta_0\quad\text{almost surely.}$$

Proof. From Lemmas 3 and 4 (see the Appendix), we get uniform a.s. convergence of 
$L_n(\theta)$ to the function $L(\theta)$. Moreover, $L(\theta)$ has a unique minimum at $\theta_0$. The proof then 
follows from standard arguments (see, for example, Huber (1967)). □ 

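The asymptotic distribution of $\hat\theta_n^{(1)}$ is essentially determined by $L_n'(\theta_0)$, where 
$L_n'(\theta)=\frac{\partial}{\partial\theta}L_n(\theta)$. Define the matrices 

$$G_\epsilon=E\left[\frac{\partial}{\partial\theta}l_t(\theta_0)\Bigl(\frac{\partial}{\partial\theta}l_t(\theta_0)\Bigr)^T\right]=4\,\bigl(E(\varepsilon_t^4)-1\bigr)\,E\left[\frac{\sigma_t^6\,\dot\sigma_t\dot\sigma_t^T}{(\sigma_t^2+\epsilon)^4}\right]$$

and 

$$H_\epsilon=E\left[\frac{\partial^2}{\partial\theta\,\partial\theta^T}l_t(\theta_0)\right]=E\left[\frac{4\sigma_t^2\,\dot\sigma_t\dot\sigma_t^T}{(\sigma_t^2+\epsilon)^2}\right],$$

where $\sigma_t=\sigma_t(\theta_0)$ and $\dot\sigma_t=\frac{\partial}{\partial\theta}\sigma_t(\theta_0)$. 

The Hessian matrix $\frac{\partial^2}{\partial\theta\,\partial\theta^T}l_t(\theta)$ is given explicitly in the proof of Lemma 3 in the Appendix. 
The asymptotic distribution of $\hat\theta_n^{(1)}$ can now be derived as follows. 

Theorem 2. Let $\epsilon>0$ and $\theta_0$ be in the interior of $\Theta$. Then, under assumptions (A1), 
(B1), (B2), (S), $(M_5')$, 

$$\sqrt n\,(\hat\theta_n^{(1)}-\theta_0)\to_d N(0,H_\epsilon^{-1}G_\epsilon H_\epsilon^{-1})$$

as $n\to\infty$, where $N(0,\Sigma)$ denotes the three-dimensional centered normal distribution with 
covariance matrix $\Sigma$. 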

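Proof. By Taylor series expansion, 

$$0=L_n'(\hat\theta_n^{(1)})=L_n'(\theta_0)+L_n''(\bar\theta)\,(\hat\theta_n^{(1)}-\theta_0)$$

with 

$$L_n''(\bar\theta)=\frac1n\sum_{t=1}^{n}\frac{\partial^2}{\partial\theta\,\partial\theta^T}l_t(\theta)$$

evaluated in each row $j=1,2,3$ at some point $\theta=\bar\theta_j$ with $\|\bar\theta_j-\theta_0\|\le\|\hat\theta_n^{(1)}-\theta_0\|$. Since 

$$E\left[\frac{\partial}{\partial\theta}l_t(\theta_0)\,\Big|\,\mathcal{F}_{t-1}\right]=0,$$

where $\mathcal{F}_t=\sigma(\varepsilon_s,\ s\le t)$, $\frac{\partial}{\partial\theta}l_t(\theta_0)$ is a vector of stationary, ergodic martingale differences 
with finite variance. Hence, from Theorem 23.1 in Billingsley (1968) and the Cramér-Wold 
device, 

$$\sqrt n\,L_n'(\theta_0)\to_d N(0,G_\epsilon)$$

as $n\to\infty$. From Lemma 3 and Proposition 1, we get 

$$L_n''(\bar\theta)\to H_\epsilon$$

almost surely as $n\to\infty$. By Lemma 5, $H_\epsilon$ is invertible. This, together with Slutsky's 
theorem, concludes the proof. □ 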

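Remark 3. Letting $\epsilon$ tend to zero, we get $H_\epsilon^{-1}G_\epsilon H_\epsilon^{-1}\to(E(\varepsilon_t^4)-1)H_0^{-1}$, where 
$H_0=4E(\dot\sigma_t\dot\sigma_t^T/\sigma_t^2)$ with $\dot\sigma_t=\frac{\partial}{\partial\theta}\sigma_t(\theta_0)$. If $E(\sigma_t^{-2})=\infty$, this means, for instance, 
that the asymptotic variance of $\hat a$ approaches zero. 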
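The limit in Remark 3 can be traced by a short calculation. The sketch below assumes that the individual terms have the form $l_t(\theta)=(X_t^2+\epsilon)/(\sigma_t^2(\theta)+\epsilon)+\ln(\sigma_t^2(\theta)+\epsilon)$, writes $\sigma_t=\sigma_t(\theta_0)$ and $\dot\sigma_t=\partial\sigma_t(\theta_0)/\partial\theta$ as shorthand, and assumes that limit and expectation may be interchanged (with $H_0$ finite); these are assumptions of the illustration, not statements taken from the proofs.

```latex
\begin{align*}
\frac{\partial}{\partial\theta} l_t(\theta_0)
  &= \frac{2\sigma_t\dot\sigma_t\,(\sigma_t^2 - X_t^2)}{(\sigma_t^2+\epsilon)^2}
   = \frac{2\sigma_t^3\dot\sigma_t\,(1-\varepsilon_t^2)}{(\sigma_t^2+\epsilon)^2}
   \qquad (X_t=\varepsilon_t\sigma_t),\\[4pt]
G_\epsilon
  &= E\!\left[\frac{\partial}{\partial\theta} l_t(\theta_0)
      \Bigl(\frac{\partial}{\partial\theta} l_t(\theta_0)\Bigr)^{T}\right]
   = 4\,\bigl(E(\varepsilon_t^4)-1\bigr)\,
     E\!\left[\frac{\sigma_t^6\,\dot\sigma_t\dot\sigma_t^{T}}{(\sigma_t^2+\epsilon)^4}\right]
   \;\xrightarrow[\epsilon\to 0]{}\;
   \bigl(E(\varepsilon_t^4)-1\bigr)\,H_0,\\[4pt]
H_\epsilon
  &= 4\,E\!\left[\frac{\sigma_t^2\,\dot\sigma_t\dot\sigma_t^{T}}{(\sigma_t^2+\epsilon)^2}\right]
   \;\xrightarrow[\epsilon\to 0]{}\;
   4\,E\!\left[\frac{\dot\sigma_t\dot\sigma_t^{T}}{\sigma_t^2}\right] = H_0,\\[4pt]
H_\epsilon^{-1} G_\epsilon H_\epsilon^{-1}
  &\;\xrightarrow[\epsilon\to 0]{}\;
   H_0^{-1}\,\bigl(E(\varepsilon_t^4)-1\bigr) H_0\, H_0^{-1}
   = \bigl(E(\varepsilon_t^4)-1\bigr)\,H_0^{-1}.
\end{align*}
```

The first line also shows why the score is a martingale difference: $\sigma_t$ and $\dot\sigma_t$ depend on the past only, and $E(1-\varepsilon_t^2)=0$. The factor $E(\varepsilon_t^4)-1$ is $E[(1-\varepsilon_t^2)^2]$ under $E(\varepsilon_t^2)=1$.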



Remark 4. Formally, we get the same rate of convergence and asymptotic variance as 
for short memory models, such as GARCH($p,q$) and ARCH($\infty$) (see Berkes et al. (2003) 
and Robinson and Zaffaroni (2006)). 

3.2. Estimation given the finite past 

Given a finite sample $X_1,\ldots,X_n$, the computable version of the estimator is defined by 

$$\hat\theta_n^{(2)}:=\arg\min_{\theta\in\Theta}\bar L_n(\theta),$$

where 

$$\bar L_n(\theta)=\frac1n\sum_{t=1}^{n}\left[\frac{X_t^2+\epsilon}{\bar\sigma_t^2(\theta)+\epsilon}+\ln\bigl(\bar\sigma_t^2(\theta)+\epsilon\bigr)\right].$$

This estimator is consistent in the following sense. 

Theorem 3. Let $\epsilon>0$ and assume that (A1), (B1), (B2) and (S) hold. Then, under 
$(M_3)$ or $(M_3')$, 

$$\hat\theta_n^{(2)}\to\theta_0$$

as $n\to\infty$, where convergence holds in $L^1$ and in probability. 

Proof. The proof follows as for Theorem 1, with the additional application of Lemma 
6. □ 

Obtaining the asymptotic distribution of $\hat\theta_n^{(2)}$ is more complicated due to the slow 
convergence of $|\sigma_t(\theta)-\bar\sigma_t(\theta)|$ to zero. To be more specific, note that 

$$\sigma_t(\theta)-\bar\sigma_t(\theta)=\sum_{j=t}^{\infty}b_j(c,d)X_{t-j}.$$

As in the proof of Theorem 2, Taylor series expansion yields 

$$0=\bar L_n'(\theta_0)+\bar L_n''(\bar\theta)\,(\hat\theta_n^{(2)}-\theta_0),$$

where $\bar L_n'(\theta)$ and $\bar L_n''(\theta)$ are the same as $L_n'(\theta)$ and $L_n''(\theta)$ with $\sigma_t(\theta)$ replaced by $\bar\sigma_t(\theta)$. 
Since the law of large numbers still holds (see Lemma 6), the asymptotic distribution 
of $\hat\theta_n^{(2)}$ follows from the asymptotic distribution of $\bar L_n'(\theta_0)$. The latter is the same as for 
$L_n'(\theta_0)$, provided that 

$$d_n:=\sqrt n\,\bigl(\bar L_n'(\theta_0)-L_n'(\theta_0)\bigr)\to0$$

as $n\to\infty$. Applying the mean value theorem to $(x^2+\epsilon)^{-1}$ and taking into account the asymptotic 
behavior of $E[(\sigma_t(\theta)-\bar\sigma_t(\theta))^2]$, a rough upper bound for $E(|d_n|)$ is of the order $n^{d}$. 

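In the long-memory case with $d>0$, this bound does not converge to zero. We therefore 
propose an alternative estimator, at the cost of a slower rate of convergence: for given 
$0<\beta<1$, define $m(n)=\lfloor n^{\beta}\rfloor-1$, where $\lfloor\cdot\rfloor$ denotes the floor function, 

$$\tilde L_n(\theta):=\frac{1}{m(n)}\sum_{t=n-m(n)}^{n}\left[\frac{X_t^2+\epsilon}{\bar\sigma_t^2(\theta)+\epsilon}+\ln\bigl(\bar\sigma_t^2(\theta)+\epsilon\bigr)\right]$$

and 

$$\hat\theta_n^{(3)}:=\arg\min_{\theta\in\Theta}\tilde L_n(\theta).$$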
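To make the windowed criterion concrete, here is a minimal sketch. It assumes hyperbolic weights $cj^{d-1}$, individual terms of the form $(X_t^2+\epsilon)/(\bar\sigma_t^2+\epsilon)+\ln(\bar\sigma_t^2+\epsilon)$, and a plain grid search over $d$ with $a$ and $c$ held fixed; the function names and the grid search are illustrative choices, not the optimizer used for the results reported here. The criterion is averaged over the summands actually used, which does not change the minimizer.

```python
import math


def sigma_bar_at(x, t, a, c, d):
    """Truncated volatility proxy at 1-based time t, where x[0] is X_1."""
    return a + sum(c * j ** (d - 1.0) * x[t - j - 1] for j in range(1, t))


def windowed_objective(x, a, c, d, eps, beta):
    """Average of the modified pseudo-likelihood terms over the last
    m(n) + 1 time points, with m(n) = floor(n^beta) - 1."""
    n = len(x)
    m = math.floor(n ** beta) - 1
    if m < 1:
        raise ValueError("sample too short for this beta")
    total = 0.0
    for t in range(n - m, n + 1):  # t = n - m(n), ..., n (1-based)
        s2 = sigma_bar_at(x, t, a, c, d) ** 2
        total += (x[t - 1] ** 2 + eps) / (s2 + eps) + math.log(s2 + eps)
    return total / (m + 1)


def grid_argmin_d(x, a, c, eps, beta, grid):
    """Grid value of d with the smallest criterion (a and c held fixed)."""
    return min(grid, key=lambda d: windowed_objective(x, a, c, d, eps, beta))
```

A grid (or several starting values) is a cautious choice here because the criterion can have many local minima when $\epsilon$ is small.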



This estimator has the following properties. 



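Theorem 4. Let $\epsilon>0$, $\theta_0$ be in the interior of $\Theta$ and assume (A1), (B1), (B2) and (S). 
The following then hold: 

(a) if $(M_3)$ or $(M_3')$ holds and $0<\beta<1$, then $\hat\theta_n^{(3)}$ converges in $L^1$ and in probability 
to $\theta_0$; 

(b) if $(M_5')$ holds and $0<\beta<1-2d$, then as $n\to\infty$, 

$$n^{\beta/2}\,(\hat\theta_n^{(3)}-\theta_0)\to_d N(0,H_\epsilon^{-1}G_\epsilon H_\epsilon^{-1});$$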
(c) if (M3) or (MJj) holds and P^l- 2d, then 



Proof. The proof is a combination of Theorem 2 and the arguments given above. □ 

Remark 5. The choice of $\epsilon$ is important for a good performance of the estimator $\hat\theta_n^{(3)}$. 
While the above theorems indicate that $\epsilon$ should be chosen as small as possible, the 
optimization in the definition of $\hat\theta_n^{(3)}$ becomes numerically more demanding if $\epsilon\to0$ since 
the function $\tilde L_n$ may then exhibit many local minima. As an illustration, in Figure 1, $\tilde L_n$ 
is plotted as a function of the single parameter $d$ for different values of $\epsilon$. How this effect 
can be handled statistically and how it depends on the parameter $\theta_0$ are the subjects of 
current research. 



Remark 6. Calculations analogous to those above imply that for short-memory LARCH 
processes (that is, LARCH processes with absolutely summable autocorrelations of $X_t^2$), 
the central limit theorem for $\hat\theta_n^{(2)}$ holds with $\sqrt n$-rate of convergence. This also includes 
the case where $d<0$. 

Figure 1. For $\epsilon=0.01$, 0.001, 0.0001 and 0, the function $\tilde L_n$ is plotted as a function of $d$ with 
fixed $a=1$ and $c=0.1$ (one panel per value of $\epsilon$). In each plot, the same path of $X_t$ is used, where the true parameter 
value is $\theta_0=(1,0.4,0.1)^T$ and $n=2000$. The vertical line indicates the true value of $d$. 

Remark 7. If $d>0$ is close to zero, then the best rate of convergence $n^{\beta/2}$ is close 
to $n^{1/2}$. However, for strong long memory with $d$ close to $1/2$, the upper bound for $\beta$, 
given by $1-2d$, is very small. Thus, the number of $\bar\sigma_t$'s used for estimation is very small 
compared to $n$ and the rate of convergence of $\hat\theta_n^{(3)}$ is very slow. 
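The trade-off in Remark 7 is simple arithmetic. The helpers below (hypothetical names) return the upper bound $1-2d$ for $\beta$ and the corresponding limiting rate exponent $1/2-d$, so that one can see how quickly the attainable rate deteriorates as $d$ approaches $1/2$.

```python
def max_beta(d):
    """Upper bound 1 - 2d for beta in the central limit theorem."""
    return 1.0 - 2.0 * d


def rate_exponent(beta):
    """Exponent of n in the normalisation n^(beta/2)."""
    return beta / 2.0


def best_rate_exponent(d):
    """Limit of beta/2 as beta approaches 1 - 2d, i.e. 1/2 - d."""
    return 0.5 - d
```

For $d=0.1$ the rate exponent can be pushed up to nearly 0.4, whereas for $d=0.4$ it stays below 0.1.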

Remark 8. Though consistency holds for all $\beta\in(0,1]$, the asymptotic distribution of 
$\hat\theta_n^{(3)}$ for $\beta>1-2d$ remains an open problem. The reason for the bound $1-2d$ is that, 
defining 

$$d_n:=\sqrt{m(n)}\,\bigl(\tilde L_n'(\theta_0)-L_n'(\theta_0)\bigr),$$

we have 

$$E[|d_n|]=O(n^{\beta/2+d-1/2}),$$

which is $o(1)$ for $\beta<1-2d$. For $\beta=1-2d$, the difference is bounded, but it is unclear 
whether or not it converges to zero. 

Remark 9. Alternative estimates of $\theta_0$ could be defined via moment estimation. For 
instance, empirical estimates of the first three autocovariances of $X_t^2$, $\gamma_{X^2}(0)$, $\gamma_{X^2}(1)$ 
and $\gamma_{X^2}(2)$, could be used to estimate $\theta_0$ by the method of moments. Limit theorems 
in Berkes and Horvath (2003) can then be used to show that the resulting estimate is 
asymptotically normal and the rate of convergence is $n^{1/2-d}$. This is exactly the rate 
obtained for $\hat\theta_n^{(3)}$ at the border $\beta=1-2d$. 
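The empirical ingredients of such a moment estimator are easy to compute. The sketch below returns sample autocovariances of the squared observations at lags 0, 1 and 2, using the common $1/n$ normalisation (an assumption; other normalisations are possible). Solving the moment equations for $\theta$ is not shown.

```python
def sample_autocov_squares(x, max_lag=2):
    """Sample autocovariances of X_t^2 at lags 0..max_lag (1/n normalisation)."""
    y = [v * v for v in x]
    n = len(y)
    mean = sum(y) / n
    return [
        sum((y[t] - mean) * (y[t + k] - mean) for t in range(n - k)) / n
        for k in range(max_lag + 1)
    ]
```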

4. Simulations 

We illustrate Theorem 4 by calculating $\hat\theta_n^{(3)}$ for simulated LARCH processes with standard 
normal $\varepsilon_t$ and a parametrization such that (B1) and (B2) hold. The model parameter 
vector $\theta$ and the constants $\epsilon$ and $\beta$ are chosen as follows: 

• Case 1: $d=0.1$, $a=1$, $c=0.2$; $\epsilon=0.01$, $\beta=0.799$; 

• Case 2: $d=0.2$, $a=1$, $c=0.2$; $\epsilon=0.01$, $\beta=0.599$. 

To simulate the process $X_t$ via (1) and (2), a pre-sample of length 10 000 is used 
for initialization. Moreover, the infinite series in (2) is truncated at order 2000. Figures 2a 
and b show typical sample paths of $X_t$ for the two cases. The corresponding sample 
autocorrelation functions of $X_t^2$ are given in Figures 2c and d, respectively. 
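A simulation along these lines might look as follows. This is only a sketch under stated assumptions: weights $b_j=cj^{d-1}$, standard normal innovations from Python's `random` module, a recursion started from an empty past, and default values mirroring the pre-sample length and truncation order quoted above. The function name and the seeding are hypothetical, and the pure-Python loop is slow for the full default settings.

```python
import random


def simulate_larch(n, a, c, d, burn_in=10_000, trunc=2000, seed=0):
    """Simulate X_t = eps_t * sigma_t with
    sigma_t = a + sum_{j=1}^{trunc} c * j^(d-1) * X_{t-j},
    discarding the first burn_in values."""
    rng = random.Random(seed)
    b = [c * j ** (d - 1.0) for j in range(1, trunc + 1)]
    x = []
    for _ in range(burn_in + n):
        past = x[-trunc:][::-1]  # X_{t-1}, X_{t-2}, ... (as far as available)
        sigma = a + sum(bj * xj for bj, xj in zip(b, past))
        x.append(rng.gauss(0.0, 1.0) * sigma)
    return x[burn_in:]
```

For quick experiments, smaller values such as `burn_in=500` and `trunc=200` keep the run time short, at the price of a cruder approximation to the stationary solution.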

For simplicity, we focus on the estimation of $d$ only. The asymptotic standard deviation 
given in Theorem 4b (calculated by simulation) is equal to 1.68 in Case 1 and to 1.14 
in Case 2. To compare asymptotic with finite-sample results, a small simulation study is 
carried out as follows. For sample sizes $n=1000$, 2500, 5000 and 10000, $N=1000$ independent 
samples of the LARCH process are drawn and the estimator $\hat\theta_n^{(3)}$ is calculated. 
Summary statistics of the results are given in Tables 1 (Case 1) and 2 (Case 2). Normal 
probability plots based on all 1000 simulations are given in Figures 3a-h and 4a-h. 

Figure 2. Two simulated sample paths of a long-memory LARCH process $X_t$ and the corresponding 
sample autocorrelation functions of $X_t^2$. The long-memory parameter $d$ is equal to 0.1 
in Figures 2a and c, and to 0.2 in Figures 2b and d, respectively. 

Comparing the results, one can see a strong discrepancy between robust and non-robust 
estimates of the expected value, standard deviation and skewness of $\hat\theta_n^{(3)}$. The 
robust estimates are close to the asymptotic values obtained from Theorem 4b, already 
for $n=1000$. This is not the case for the non-robust estimates. Most extreme are the 
values of the (non-robust) skewness measure which should converge to zero, but instead 
seem to be increasing in absolute value. This can be explained as follows. Out of $N=1000$ 
simulations, there are a few cases where the algorithm terminated at a solution 
equal, or very close to, the lower end of the parameter range used in the numerical 
minimization (see also Remark 5 and Figure 1). As expected from Theorem 4a (and 
b), the number of cases where this happens decreases with increasing $n$. However, since 
the variance of estimates in the interior of $\Theta$ tends to zero with increasing $n$, those 
few estimates that are equal to the fixed lower limit of the parameter space become 
increasingly extreme outliers, compared to the bulk of the simulated data. Indeed, even 
if $N$ tends to infinity and only one out of $N$ simulations is equal to the lower bound, the 
empirical skewness will not converge to zero. For this reason, the (non-robust) empirical 
standard deviation, skewness and normal probability plot are grossly contaminated by 
the small (and asymptotically negligible) number of simulations where the algorithm 
did not converge properly. Apart from the robust estimates, we therefore also computed 
the same empirical non-robust quantities leaving out the ten (out of $N=1000$) smallest 
values of $\hat\theta_n^{(3)}$. The non-robust estimates are then indeed much closer to the theoretical 
values, and the normal probability plots indicate convergence (albeit rather slow for 
$d=0.2$) to the normal distribution. 
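The robust summaries referred to here can be sketched as follows: a scale estimate given by the median absolute deviation divided by the 75% quantile of the standard normal distribution, and a quartile-based skewness $(Q_3-2Q_2+Q_1)/(Q_3-Q_1)$. The exact quartile convention used for the tables is not stated, so the default of `statistics.quantiles` is an assumption, as are the function names.

```python
from statistics import NormalDist, median, quantiles


def mad_scale(values):
    """MAD divided by the 75% quantile of N(0, 1) (about 0.6745)."""
    values = list(values)
    med = median(values)
    mad = median(abs(v - med) for v in values)
    return mad / NormalDist().inv_cdf(0.75)


def quartile_skewness(values):
    """(Q3 - 2*Q2 + Q1) / (Q3 - Q1), using the default quartile convention."""
    q1, q2, q3 = quantiles(list(values), n=4)
    return (q3 - 2.0 * q2 + q1) / (q3 - q1)
```

A single extreme value, such as an estimate stuck at the lower bound of the parameter range, barely moves either quantity, which is why they behave so differently from the empirical standard deviation and skewness.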

An additional observation we can make is that convergence to the asymptotic distribution 
is slower for stronger long memory ($d=0.2$). The reason is that for $d=0.2$, the number 
of terms used in $\tilde L_n(\theta)$ is much smaller, namely $O(n^{0.599})$, as compared to $O(n^{0.799})$ 
for $d=0.1$. More specifically, for $n=1000$, 2500, 5000 and 10000, we have $m(n)=$ 
62, 108, 164 and 248 for $d=0.2$, whereas for $d=0.1$, we have $m(n)=249$, 518, 902 and 
1570. 
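These counts can be checked directly. The quoted values agree with $\lfloor n^{\beta}\rfloor$, which is the number of summands $t=n-m(n),\ldots,n$ when $m(n)=\lfloor n^{\beta}\rfloor-1$; the helper name below is hypothetical.

```python
import math


def number_of_terms(n, beta):
    """floor(n^beta): the number of time points t = n - m(n), ..., n."""
    return math.floor(n ** beta)


sample_sizes = (1000, 2500, 5000, 10000)
counts_case1 = [number_of_terms(n, 0.799) for n in sample_sizes]  # d = 0.1
counts_case2 = [number_of_terms(n, 0.599) for n in sample_sizes]  # d = 0.2
```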



Figure 3. Normal probability plots of $N=1000$ simulated estimates $\hat d_n$ for Case 1 (Figures 
3a-d) and Case 2 (Figures 3e-h). Within each case, the four panels correspond to $n=1000$, 2500, 5000 
and 10000; the horizontal axis shows theoretical quantiles. 

Figure 4. Normal probability plots of simulated estimates $\hat d_n$ for Case 1 (Figures 4a-d) and 
Case 2 (Figures 4e-h), with ten (out of $N=1000$) of the lowest points excluded. Within each case, the 
four panels correspond to $n=1000$, 2500, 5000 and 10000; the horizontal axis shows theoretical quantiles. 

Table 1. Mean, standard deviation and skewness of $\hat d_n$ with $\beta=0.799$, based on $N=1000$ 
simulated LARCH processes with long-memory parameter $d=0.1$ (Case 1). The asymptotic 
standard deviation from Theorem 4(b) is equal to 1.681. Here, $s$ is the empirical standard 
deviation, $\tilde s$ is the MAD divided by the 75%-percentile of the standard normal distribution and 
q-skewness is the empirical quartile skewness. In the upper table, all $N=1000$ simulated values 
are used; in the lower table, the ten smallest values of $\hat d_n$ are excluded 

n                          1000       2500       5000      10000

d = 0.1: all 1000 simulations
Mean                      0.047      0.069      0.085      0.088
Median                    0.094      0.099      0.104      0.101
s                         0.353      0.290      0.216      0.198
s~                        0.121      0.082      0.054      0.041
n^(beta/2) s              5.570      6.605      6.490      7.864
n^(beta/2) s~             1.909      1.859      1.621      1.629
Skewness                -10.620    -13.161    -16.320    -20.464
q-skewness               -0.118     -0.038     -0.093     -0.089

d = 0.1: 10 smallest values of d excluded
Mean                      0.072      0.091      0.098      0.098
Median                    0.094      0.101      0.104      0.101
s                         0.150      0.090      0.057      0.043
s~                        0.119      0.080      0.053      0.040
n^(beta/2) s              2.384      2.063      1.720      1.722
n^(beta/2) s~             1.882      1.822      1.595      1.604
Skewness                 -1.199     -0.747     -0.684     -0.422
q-skewness               -0.103     -0.042     -0.075     -0.079

Table 2. Mean, standard deviation and skewness of $\hat d_n$ with $\beta=0.599$, based on $N=1000$ 
simulated LARCH processes with long-memory parameter $d=0.2$ (Case 2). The asymptotic 
standard deviation from Theorem 4(b) is equal to 1.14. Here, $s$ is the empirical standard 
deviation, $\tilde s$ is the MAD divided by the 75%-percentile of the standard normal distribution and 
q-skewness is the empirical quartile skewness. In the upper table, all $N=1000$ simulated values 
are used; in the lower table, the ten smallest values of $\hat d_n$ are excluded 

n                          1000       2500       5000      10000

d = 0.2: all 1000 simulations
Mean                     -0.292      0.059      0.110      0.168
Median                    0.181      0.201      0.198      0.199
s                         1.395      0.719      0.552      0.255
s~                        0.215      0.133      0.102      0.082
n^(beta/2) s             11.041      7.489      7.079      4.030
n^(beta/2) s~             1.703      1.385      1.310      1.291
Skewness                 -2.761     -5.800     -7.899    -11.752
q-skewness               -0.292     -0.134     -0.117     -0.093

d = 0.2: 10 smallest values of d excluded
Mean                     -0.245      0.110      0.161      0.186
Median                    0.184      0.202      0.199      0.200
s                         1.319      0.511      0.219      0.114
s~                        0.213      0.131      0.101      0.080
n^(beta/2) s             10.437      5.319      2.810      1.800
n^(beta/2) s~             1.688      1.362      1.290      1.262
Skewness                 -2.949     -6.831     -4.829     -1.336
q-skewness               -0.285     -0.112     -0.098     -0.081

5. Final remarks 

We considered parametric estimation for LARCH processes using a modified conditional 
pseudo-likelihood function. The rate of convergence of the computable version discussed 
in Section 3.2 depends on the strength of long memory. For short-memory processes 
($d<0$), the usual central limit theorem with $\sqrt n$ convergence holds. If, on the other 
hand, $d$ is close to $\tfrac12$, convergence is very slow, so long time series are needed to obtain 
reliable estimates. In view of the typical range of applications of volatility models, this 
may not necessarily be a problem. For instance, for high-frequency data in finance, the 
sample size $n$ is often close to 100000 or more so that the application of $\hat\theta_n^{(3)}$ is feasible. 
How far the best rate $n^{1/2-d}$ may be improved is an open problem. Alternative methods, 
including Whittle estimation and improved approximations of $\sigma_t$, are the subjects of 
current research. 

Appendix 

Lemma 1. Let $(\xi(d,\omega))_{d\in[a,b]}$ be a real-valued separable stochastic process with mean 0 
and $E(\xi^2(d))<\infty$ for all $d\in[a,b]$. 

(a) Denote the covariance function of $\xi$ by $v(d,d')=E(\xi(d)\xi(d'))$. The following then 
hold: 

(i) If $v(d,d')$ is continuous in $(d,d')$, then $(\xi(d))_{d\in[a,b]}$ is measurable. 
(ii) If $v(d,d')$ is continuously differentiable, then $(\xi(d))_{d\in[a,b]}$ is mean square differentiable, 
that is, there is a process $(\xi'(d))_{d\in[a,b]}$ with 

$$\lim_{h\to0}E\left[\left(\frac{\xi(d+h)-\xi(d)}{h}-\xi'(d)\right)^2\right]=0$$

for all $d\in[a,b]$. Moreover, for almost all $\omega$, $\xi'(\cdot,\omega)$ coincides with the distributional 
derivative $\partial\xi(\cdot,\omega)/\partial d$. 
(iii) If $v(d,d')$ is $m$ times continuously differentiable, then, for almost all $\omega$, $\xi(\cdot,\omega)$ 
is $m-1$ times continuously differentiable. 

(b) If $(\xi(d))_{d\in[a,b]}$ is mean square differentiable with $E(|\xi(d)|^m)<\infty$ and $E(|\xi'(d)|^m)<\infty$ 
for $m\ge1$, then 

$$E\Bigl(\sup_{d\in[a,b]}|\xi(d)|^m\Bigr)\le E(|\xi(a)|^m)+E(|\xi(b)|^m)+m(b-a)\sup_{d\in[a,b]}\{E(|\xi(d)|^m)\}^{(m-1)/m}\{E(|\xi'(d)|^m)\}^{1/m}.$$

Proof. (a) is from Kunita (1990), page 40, whereas (iii) is essentially an application of 
Sobolev's embedding theorem (see, for example, Adams and Fournier (2003)). (b) is an 
extension of Theorem 3B in Parzen (1965), page 85. □ 

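Throughout this appendix, $K_i$ will denote generic finite constants. 

Lemma 2. Let $\theta=(\theta_1,\theta_2,\theta_3)^T$ and suppose that (A1), (B1), (B2) and (S) hold. Then: 
(a) under $(M_3)$, $(M_p')$ or $(M_p'')$, we have for $k\le3$ 

$$E\left(\sup_{\theta\in\Theta}\left|\frac{\partial^k}{\partial\theta_{i_1}\cdots\partial\theta_{i_k}}\sigma_t(\theta)\right|^p\right)<\infty,$$

where $p=3$ if $(M_3)$ holds; 
(b) under $(M_3)$, $(M_p')$ or $(M_p'')$, we have 

$$E\Bigl(\sup_{\theta\in\Theta}|\sigma_t(\theta)-\bar\sigma_t(\theta)|^p\Bigr)\to0\quad\text{as }t\to\infty,$$

where $p=3$ if $(M_3)$ holds. 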

Proof. We only give the proof under $(M_3)$. The proof is a combination of Lemma 1b 
and the combinatorial arguments of Lemmas B.1-B.3 from Giraitis et al. (2003). The 
other cases follow by similar arguments and by using, under $(M_p')$, Lemma 3.1 of Giraitis 
et al. (2000b) and, under $(M_p'')$, Proposition 2.2 of Giraitis et al. (2003), respectively. 
First, note that 



(0) 



Moreover, $\sigma_t(\theta)-\bar\sigma_t(\theta)$ can be expanded as a Volterra series of the type 

$$\sum_{k=1}^{\infty}\ \sum_{s_k<\cdots<s_1<t}f_{t,1}(t-s_1)\,f_{t,2}(s_1-s_2)\cdots f_{t,2}(s_{k-1}-s_k)\,\varepsilon_{s_1}\cdots\varepsilon_{s_k}$$

with $f_{t,1},f_{t,2}\in L^2(\mathbb{Z}_+)$, $\|f_{t,1}\|_2<\infty$ and $\|f_{t,2}\|_2<1$. For (a), we set 



with 



and 



while for (b), 



with 



and 



(/t,2(j))j>l-(&j(0))j>l, 

MS) - MS) = ci^i 

iftAj)),>i^{b,+tm,>i 

{ftAj))o>i^iHS))o>i- 



The proof then follows from the application of Lemma 1b and the following result. A 
small modification of Lemmas B.1 and B.3 in Giraitis et al. (2003) shows that 



and 
where 

$$A_{t,i}=|\mu|_3^{1/3}\|f_{t,i}\|_3+3\zeta\|f_{t,i}\|_2$$

and $\zeta$ is defined as in assumption $(M_3)$. Hence, 



3 . A'.i 



(i-A,2)3 



Since $\Theta$ is compact, we get in (a) that $D_{t,1} < C_1$ and $D_{t,2} < 1 - C_2$, where the constants 
$C_1 < \infty$ and $0 < C_2 < 1$ are independent of $\theta$. Furthermore, in (b), $D_{t,1} \to 0$ as $t \to \infty$, 
uniformly for all $\theta \in \Theta$. Note that $\|f_{t,1}\|_2$ may be greater than 1 and only $\|f_{t,2}\|_2 < 1$ is 
used. □ 
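Lemma 3. Let assumptions (A1), (B1), (B2) and (S) hold. Then, under (M3) or (M3'), 

\[
\sup_{\theta\in\Theta} |L_n(\theta) - L(\theta)| \to 0 \quad \text{a.s. as } n \to \infty. \tag{3}
\]

If (M4') holds, then 

\[
\sup_{\theta\in\Theta} \|L_n'(\theta) - L'(\theta)\| \to 0 \quad \text{a.s. as } n \to \infty, \tag{4}
\]

where $L'(\theta) = E\bigl(\frac{\partial}{\partial\theta} l_t(\theta)\bigr)$. If (M5') holds, then 

\[
\sup_{\theta\in\Theta} \|L_n''(\theta) - L''(\theta)\| \to 0 \quad \text{a.s. as } n \to \infty, \tag{5}
\]

where $L''(\theta) = E\bigl(\frac{\partial^2}{\partial\theta\,\partial\theta'} l_t(\theta)\bigr)$. In the three respective cases, $L(\theta)$ (resp. $L'(\theta)$, $L''(\theta)$) is 
continuous in $\theta$. 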

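Proof. We first prove (3). From (B1), we have 

\[
\sup_{\theta\in\Theta} E|l_t(\theta)| \le K\bigl(E(X_t^2) + \epsilon\bigr) + K \sup_{\theta\in\Theta} E\bigl(\sigma_t^2(\theta)\bigr) < \infty.
\]

Thus $L_n(\theta) \to L(\theta)$ by ergodicity of $X_t$ and $\sigma_t(\theta)$ for each individual $\theta \in \Theta$. Uniform 
convergence follows from a.s. equicontinuity of $(L_n(\theta))_{\theta\in\Theta}$. From the mean value theorem, 
and the stationarity and ergodicity of $\frac{\partial}{\partial\theta} l_t(\theta)$, it suffices to show that 

\[
E\Bigl(\sup_{\theta\in\Theta}\Bigl\|\frac{\partial}{\partial\theta} l_t(\theta)\Bigr\|\Bigr) < \infty
\]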



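(see, for example, Andrews (1992)). Since 

\[
\Bigl\|\frac{\partial}{\partial\theta} l_t(\theta)\Bigr\| \le K_1 \|\partial_\theta \sigma_t(\theta)\| X_t^2 + K_2,
\]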



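we get from Hölder's inequality and Lemma 2 that 

\[
E\Bigl(\sup_{\theta\in\Theta}\Bigl\|\frac{\partial}{\partial\theta} l_t(\theta)\Bigr\|\Bigr)
\le K_1 \bigl\{E(|X_t|^3)\bigr\}^{2/3} \Bigl\{E\Bigl(\sup_{\theta\in\Theta}\|\partial_\theta \sigma_t(\theta)\|^3\Bigr)\Bigr\}^{1/3} + K_2 < \infty.
\]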
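
The Hölder step above can be written out explicitly. The following is a sketch only; the shorthand $Y_t$ for the supremum of the gradient norm is introduced here for readability and is not notation from the paper. The conjugate exponents are $3/2$ and $3$, which is why a third moment of $X_t$ and the case $p = 3$ of Lemma 2 appear.

```latex
\begin{align*}
Y_t &:= \sup_{\theta\in\Theta}\|\partial_\theta \sigma_t(\theta)\|,
\qquad \tfrac{1}{3/2} + \tfrac{1}{3} = \tfrac{2}{3} + \tfrac{1}{3} = 1, \\
E\bigl(Y_t X_t^2\bigr)
&\le \bigl\{E\bigl(|X_t^2|^{3/2}\bigr)\bigr\}^{2/3}\,\bigl\{E\bigl(Y_t^3\bigr)\bigr\}^{1/3}
 = \bigl\{E\bigl(|X_t|^3\bigr)\bigr\}^{2/3}\,\bigl\{E\bigl(Y_t^3\bigr)\bigr\}^{1/3}.
\end{align*}
```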



In (4) and (5), pointwise convergence again follows from ergodicity and the particular 
moment assumption. Uniform convergence is also proved as above. Note that the Hessian 
matrix of $L_n(\theta)$ is given by 
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where 



dedO' 



+ 



1 - 



do 

d_ 

do" 



(6) 



92 



dede- 



Hence, the matrix norm of this Hessian is dominated by a linear combination of the terms 



sup 

9l£0 



and 



sup 



for $i, j \in \{1,2,3\}$. Under (M4') and Lemma 2, these are bounded in $L^1$ so that (4) follows. 
Analogously, under (M5'), (5) follows by the $L^1$-boundedness of a similar linear 
combination also involving the terms 



sup 



g3 



at{e)xt 



sup 



and 



sup 



d 



d 



l^MO) — a,{e)—a,i0)X, 



d 



de, 



de 



where $i, j, k \in \{1, 2, 3\}$, for which Lemma 2 can again be applied. □ 

Lemma 4. Under (A1), (B1), (B2) and (S), for every $\theta \in \Theta \setminus \{\theta_0\}$, 

\[
L(\theta) > L(\theta_0).
\]

Proof. From $E(\varepsilon_t^2) = 1$, we get 



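\[
L(\theta) - L(\theta_0) = E\left[\frac{\sigma_t^2(\theta_0)+\epsilon}{\sigma_t^2(\theta)+\epsilon} - \ln\left(\frac{\sigma_t^2(\theta_0)+\epsilon}{\sigma_t^2(\theta)+\epsilon}\right) - 1\right].
\]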



Since $x - \ln(x) - 1 > 0$ for $1 \ne x > 0$, we have 

\[
L(\theta) \ge L(\theta_0)
\]

for all $\theta$ and $L(\theta) = L(\theta_0)$ if and only if $\sigma_t^2(\theta) = \sigma_t^2(\theta_0)$ almost surely. Given $\theta$ and $\theta_0$ 
with $\sigma_t^2(\theta) = \sigma_t^2(\theta_0)$ a.s., we show $\theta = \theta_0$. Thus we define the sets 

A = {Loen\at{9)^at{9o)}, 
A^{coen\at{9)^-ati9o)}. 



and 



Note that 



On An Nt-i, we have 



i=i j=i 



and hence 



1 

{co + c)<Tt-i [ ^ 



The right-hand side is measurable w.r.t. $\mathcal{F}_{t-2}$ and hence independent of the left-hand 
side. Since $\varepsilon_{t-1}$ has a continuous distribution, this is only possible if 

\[
P(A \cap N_{t-1}) = 0.
\]

On the sets 



fc-i 

Ak = Af] iV/l, n Nt-k 



i=l 



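for $k \ge 2$, repeat the same arguments for $\varepsilon_{t-k}$ to conclude that $P(A) = 0$. Note that the 
set $\{\omega \in \Omega \mid \exists\, t_0 : \sigma_t = 0 \text{ for all } t < t_0\}$ has probability zero, otherwise equation (2) would 
not hold. Consequently, with probability one, $\sigma_t(\theta) = \sigma_t(\theta_0)$, that is, 

\[
a + \sum_{j=1}^{\infty} b_j(\theta) X_{t-j} = a_0 + \sum_{j=1}^{\infty} b_j(\theta_0) X_{t-j}.
\]

Expectation yields $a = a_0$. Finally, considering the variance yields $c_0 j^{d_0-1} = c j^{d-1}$ for all 
$j \ge 1$. □ 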
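
The elementary inequality $x - \ln(x) - 1 > 0$ for $x > 0$, $x \ne 1$, invoked at the start of this proof, can be checked by a short convexity argument. The function name $g$ below is introduced only for this sketch.

```latex
\begin{align*}
g(x) &:= x - \ln(x) - 1, \qquad x > 0, \\
g'(x) &= 1 - \frac{1}{x}, \qquad g''(x) = \frac{1}{x^2} > 0, \\
g'(1) &= 0 \;\Longrightarrow\; \min_{x>0} g(x) = g(1) = 0, \\
g(x) &> 0 \quad \text{for all } x > 0,\ x \ne 1 \quad \text{(strict convexity gives a unique minimizer).}
\end{align*}
```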

Lemma 5. Under (Al), (Bl), (B2), (S) and (M5), the matrices and are positive 
definite for all $\theta \in \Theta$. 
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Proof. We only prove that is positive definite. The proof for follows by the same 
arguments. Given $\lambda \in \mathbb{R}^3$, we have to show that 



A' HA = E 



4-1 



Assume that there is a $\lambda = (\lambda_1, \lambda_2, \lambda_3)^T \in \mathbb{R}^3$ such that 



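almost surely. Then, on the set $\{\omega \in \Omega \mid \sigma_t \ne 0\}$, we have 

\[
\lambda_1 + \sum_{j=2}^{\infty} \bigl(\lambda_2 j^{d-1} + \lambda_3 \log(j)\, j^{d-1}\bigr) X_{t-j} = -\lambda_2 \varepsilon_{t-1} \sigma_{t-1}.
\]

By arguments similar to those used in the proof of Lemma 4, we then get $\lambda = 0$. □ 
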
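Lemma 6. Let assumptions (A1), (B1), (B2) and (S) hold. Then, under (M3) or (M3'), 

\[
\sup_{\theta\in\Theta} |\bar{L}_n(\theta) - L_n(\theta)| \to 0 \quad \text{as } n \to \infty. \tag{7}
\]

If (M4') holds, then 

\[
\sup_{\theta\in\Theta} \|\bar{L}_n'(\theta) - L_n'(\theta)\| \to 0 \quad \text{as } n \to \infty. \tag{8}
\]

If (M5') holds, then 

\[
\sup_{\theta\in\Theta} \|\bar{L}_n''(\theta) - L_n''(\theta)\| \to 0 \quad \text{as } n \to \infty. \tag{9}
\]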



Proof. From the mean value theorem applied to $(x^2 + \epsilon)^{-1}$ and $\ln(x^2 + \epsilon)$, and since the 
derivatives of these functions are bounded, we get 



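\[
\begin{aligned}
\sup_{\theta\in\Theta} |\bar{L}_n(\theta) - L_n(\theta)|
&\le \frac{1}{n}\sum_{t=1}^{n} |X_t^2 + \epsilon| \sup_{\theta\in\Theta}\left|\frac{1}{\bar{\sigma}_t^2(\theta)+\epsilon} - \frac{1}{\sigma_t^2(\theta)+\epsilon}\right| \\
&\quad + \frac{1}{n}\sum_{t=1}^{n} \sup_{\theta\in\Theta}\bigl|\ln(\bar{\sigma}_t^2(\theta)+\epsilon) - \ln(\sigma_t^2(\theta)+\epsilon)\bigr| \\
&\le \frac{K_1}{n}\sum_{t=1}^{n} |X_t^2 + \epsilon| \sup_{\theta\in\Theta}|\bar{\sigma}_t(\theta) - \sigma_t(\theta)| + \frac{K_2}{n}\sum_{t=1}^{n} \sup_{\theta\in\Theta}|\bar{\sigma}_t(\theta) - \sigma_t(\theta)|.
\end{aligned}
\]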
Then, by Lemma 2b, (M3) or (M3') implies that 

\[
E\Bigl(\sup_{\theta\in\Theta}|\bar{\sigma}_t(\theta) - \sigma_t(\theta)|^3\Bigr) \to 0.
\]

Together with the Cauchy-Schwarz inequality and Cesàro summability, this proves (7). 
The other limits, (8) and (9), are proved by means of analogous arguments. □ 
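
The statement of Lemma 6 can be illustrated numerically. The following is a minimal sketch under explicit assumptions, not the authors' implementation: it assumes the parametrization $b_j = c\,j^{d-1}$, Gaussian innovations, and a per-observation criterion of the form $(X_t^2+\epsilon)/(\sigma_t^2+\epsilon) + \ln(\sigma_t^2+\epsilon)$. The function names (`larch_coeffs`, `simulate_larch`, `truncated_sigma`, `objective`) and all numerical values are hypothetical. A burn-in stretch stands in for the unobserved infinite past, so `sigma` plays the role of the conditional scale using the full past and `sigma_bar` the role of its computable version built from observed values only.

```python
import numpy as np


def larch_coeffs(c, d, m):
    """Hyperbolic weights b_j = c * j**(d - 1), j = 1..m (assumed parametrization)."""
    j = np.arange(1, m + 1, dtype=float)
    return c * j ** (d - 1.0)


def simulate_larch(n, a, c, d, burn_in=500, seed=0):
    """Simulate X_t = sigma_t * eps_t with sigma_t = a + sum_j b_j X_{t-j}.

    The first `burn_in` values are discarded; they act as a finite stand-in
    for the infinite past. Returns the last n values of X and of sigma.
    """
    rng = np.random.default_rng(seed)
    total = n + burn_in
    b = larch_coeffs(c, d, total)
    eps = rng.standard_normal(total)
    x = np.zeros(total)
    sigma = np.zeros(total)
    for t in range(total):
        # x[:t][::-1] is (x_{t-1}, ..., x_0); it is empty for t = 0.
        sigma[t] = a + np.dot(b[:t], x[:t][::-1])
        x[t] = sigma[t] * eps[t]
    return x[burn_in:], sigma[burn_in:]


def truncated_sigma(x, a, c, d):
    """Scale computed from the observed sample only (no pre-sample values)."""
    n = len(x)
    b = larch_coeffs(c, d, n)
    return np.array([a + np.dot(b[:t], x[:t][::-1]) for t in range(n)])


def objective(x, sigma, eps_const=1e-3):
    """Sample mean of (x^2 + eps)/(sigma^2 + eps) + ln(sigma^2 + eps)."""
    s2 = sigma ** 2 + eps_const
    return float(np.mean((x ** 2 + eps_const) / s2 + np.log(s2)))


if __name__ == "__main__":
    a, c, d = 0.5, 0.2, 0.2  # illustrative values only
    x, sigma = simulate_larch(300, a, c, d)
    sigma_bar = truncated_sigma(x, a, c, d)
    print("max |sigma_bar - sigma|:", np.max(np.abs(sigma_bar - sigma)))
    print("objective gap:", abs(objective(x, sigma_bar) - objective(x, sigma)))
```

With `burn_in=0` the two scale sequences coincide by construction, which gives a simple sanity check; with a positive burn-in the gap between the two objectives reflects the truncation of the past that the lemma controls.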
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